Chemical space docking enables large-scale structure-based virtual screening to discover ROCK1 kinase inhibitors

With the ever-increasing number of synthesis-on-demand compounds for drug lead discovery, there is a great need for efficient search technologies. We present the successful application of a virtual screening method that combines two advances: (1) it avoids full library enumeration, and (2) it evaluates products by molecular docking, leveraging protein structural information. Crucially, these advances enable a structure-based technique that can efficiently explore libraries of billions of molecules and beyond. We apply this method to identify inhibitors of ROCK1 from almost one billion commercially available compounds. Out of 69 purchased compounds, 27 (39%) have Ki values < 10 µM. X-ray structures of two leads confirm their docked poses. The computational cost of this approach scales roughly with the number of reagents that span a chemical space rather than with the number of products, and it is therefore multiple orders of magnitude faster than traditional docking.

with the goal of finding one structure into which all 18 co-crystallized reference compounds can be re-docked with reasonable accuracy and associated affinity estimates. This kind of validation is crucial because, over the course of the process, the number of compounds to be docked reaches several million. For practical reasons, they can only be docked into one structure. For statistical reasons, the best possible structure for scoring has to be chosen, because selecting just 100 compounds from millions requires a near-perfect enrichment of actives over inactives.
To find the most suitable structure, the binding modes of the co-crystallized ligands in all 18 crystal structures were visually inspected. All of them share an H-bond interaction with the backbone NH of Met323. Several ligands additionally interact with the neighbouring backbone carbonyl of Glu321, forming a solid hinge-binding motif. Further H-bonds are made with the gatekeeper or DFG-loop amino acids, further away from the hinge, which is plausible given the drug-like size of the crystallized molecules. Because only fragment-sized molecules are docked in the first step of Chemical Space Docking, placement constraints need to focus on one region only; otherwise, the fragments may not be able to span the distance between the constrained regions. Therefore, multiple alternative pharmacophoric constraints were defined, all of them in the hinge region only:
• Essential H-bond acceptor opposite the backbone nitrogen of Met323
• H-bond donor opposite the backbone carbonyl of Glu321 or H-bond acceptor opposite the backbone nitrogen of Met323; at least one of the two has to be fulfilled
• No constraints
To find a good docking setup, the 18 co-crystallized ligands were docked into all 18 crystal structures, applying each of the three pharmacophore constraints described above. Binding sites on the crystal structures were determined using the binding site mode in SeeSAR (version 9.2) by copying all 18 ligands into that mode and picking the amino acids within a distance of 6.5 Å of the union of all ligand atoms.
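The distance-based binding-site definition described above can be illustrated with a minimal sketch. This is not SeeSAR's implementation; it assumes that a residue is selected as soon as any of its atoms lies within 6.5 Å of any ligand atom, and the function name, the input layout and the toy coordinates are hypothetical.

```python
import math

def binding_site_residues(protein_atoms, ligand_atoms, cutoff=6.5):
    """Return sorted residue IDs with at least one atom within `cutoff` Å
    of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples
    ligand_atoms:  iterable of (x, y, z) tuples, the union over all ligands
    """
    ligand_atoms = list(ligand_atoms)
    selected = set()
    for residue_id, coord in protein_atoms:
        if residue_id in selected:
            continue  # residue already picked via another of its atoms
        # Assumption: "within" means distance <= cutoff to any ligand atom.
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            selected.add(residue_id)
    return sorted(selected)

# Toy coordinates (hypothetical, not taken from any crystal structure).
protein = [
    ("MET323", (0.0, 0.0, 0.0)),
    ("GLU321", (5.0, 0.0, 0.0)),
    ("ALA100", (20.0, 0.0, 0.0)),
]
ligand_union = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
site = binding_site_residues(protein, ligand_union)
```

Using the union of all ligand atoms, rather than a single ligand, yields one binding-site definition that covers every observed binding mode.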
The best results were obtained with the second setup in the crystal structures 2ETR, 2V55, 4YVC and 5KKT. The superposition of these structures was visually inspected, and 2V55 was dismissed because it differs markedly from the others and therefore did not seem to be a good consensus structure. As the remaining three are very similar, the electron density was checked using EDIA.¹ Results are shown in Supplementary Figure 1. EDIA represents the electron density by color: areas with high electron density are depicted in blue, whereas areas with low density are highlighted in red. The three remaining crystal structures are all well resolved in the hinge area, but overall 2ETR shows the largest blue areas and is hence the most trustworthy structure. Given the good docking results in structure 2ETR (resolution 2.6 Å) with pharmacophore constraint 2, this setup was used for the Chemical Space Docking.
Supplementary Figure 1: Electron densities of ROCK1 crystal structures: PDB ID 2ETR (left), 4YVC (middle), 5KKT (right). Areas with high electron density are depicted in blue, areas with low density in red.
Supplementary Note 3: Comparison with an alternative ligand-based virtual screening approach
When structure-based virtual screening is considered too expensive, multi-step filter cascades are often considered the only viable alternative. Fast methods are used to trim down the massive amounts of data, and slower methods are used towards the end of the funnel. 2D fingerprint similarity is an extremely fast approach for such filtering. To test the performance of this methodology, we collected a set of highly active ROCK1 inhibitors from the ChEMBL database. We first assessed the similarity of these compounds to our hit set, using ECFP4 fingerprints and the SpaceLight program.² The highest similarity of each hit to any of the ChEMBL actives covers a significant range, from about 0.3 to 0.6. We then used each of the ChEMBL actives in turn to retrieve all compounds with a 2D similarity > 0.25 from the version of REAL Space used throughout this work. These searches resulted in a total of 6 million compounds. Supplementary Figure 4 shows the distribution of 2D similarities in this set, as well as the similarity to each of the hits presented herein.
Supplementary Figure 4: Distribution of similarities in REAL Space compared to ChEMBL ROCK1 actives, together with the respective highest similarity compared to the hit molecules presented here. Source data are provided as a Source Data file.
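The similarity filter described in this note can be sketched as follows. The sketch assumes the Tanimoto coefficient as the similarity measure and represents each fingerprint as a plain set of on-bit indices; the source names only ECFP4 fingerprints and SpaceLight, which searches the chemical space without enumerating it, so this enumerated toy version is an illustration and not the procedure actually used. All function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def max_similarity(query_fp, reference_fps):
    """Highest similarity of one compound to any reference active."""
    return max(tanimoto(query_fp, ref) for ref in reference_fps)

def retrieve_similar(library, actives, threshold=0.25):
    """Names of library compounds whose best similarity exceeds threshold.

    library: dict mapping compound name -> set of on-bits
    actives: non-empty list of fingerprints of the known actives
    """
    return [name for name, fp in library.items()
            if max_similarity(fp, actives) > threshold]

# Toy fingerprints (hypothetical bit indices).
actives = [{2, 3, 4}]
library = {"cpd_a": {1, 2, 3}, "cpd_b": {9}}
hits = retrieve_similar(library, actives)
```

The same `max_similarity` helper covers both uses in the text: the highest similarity of a hit to any active, and the threshold-based retrieval from the library.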
Several observations are noteworthy. (1) The number of molecules retrieved this way is on the same order of magnitude as the number of molecules docked in the Chemical Space Docking process, so the computational effort is not significantly lower and the approach is certainly not trivial. (2) The range of similarities is quite wide, and the distribution of similarities to the novel hits is quite even. The hit set therefore shows no particular bias towards higher similarity to the ChEMBL actives, which supports the hypothesis that docking helps avoid such bias. (3) The number and range of similarities in the set of ChEMBL ROCK1 actives is quite large as well. The alternative approach might therefore be viable here, but most certainly not for targets without a body of data as rich as that available for ROCK1.

Supplementary Note 4: Comparison to Fully Enumerated Docking
In the Chemical Space Docking approach, the search space is significantly pruned by selecting building blocks that dock well to the binding site. The chemical subspace associated with each building block is fully enumerated through its associated reaction and all complementary building block partners. But how do the results compare with those obtained by brute-force, fully enumerated docking?
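Before turning to that comparison, the pruning step can be made concrete with a toy sketch for a single two-component reaction. The scoring callables, the `keep` parameter and all names are hypothetical placeholders for real docking runs, higher scores are assumed to be better, and the sketch omits the pharmacophore constraints and exclusion criteria of the actual workflow.

```python
def chemical_space_docking(pool_a, pool_b, score_fragment, score_product, keep=2):
    """Toy two-stage search over one two-component reaction.

    Returns the scored products (best first) and the number of docking calls.
    """
    # Stage 1: dock building blocks only and keep the best-scoring ones.
    seeds = sorted(pool_a, key=score_fragment, reverse=True)[:keep]
    # Stage 2: fully enumerate only the subspaces of the surviving seeds.
    products = [(a, b) for a in seeds for b in pool_b]
    ranked = sorted(products, key=lambda p: score_product(*p), reverse=True)
    n_dockings = len(pool_a) + len(products)
    return ranked, n_dockings

# Hypothetical scores standing in for docking results.
frag_scores = {"a1": 1, "a2": 5, "a3": 3, "a4": 2}
partner_bonus = {"b1": 0, "b2": 2, "b3": 1}

ranked, n_dockings = chemical_space_docking(
    pool_a=list(frag_scores),
    pool_b=list(partner_bonus),
    score_fragment=frag_scores.get,
    score_product=lambda a, b: frag_scores[a] + partner_bonus[b],
)
n_brute_force = len(frag_scores) * len(partner_bonus)
```

In this toy case the two-stage search needs 10 docking calls instead of 12 for brute force; the gap widens rapidly as the reagent pools grow, because the first stage scales with the number of building blocks rather than with the number of products.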
A docking campaign that enumerates the full library would be extremely time-consuming and expensive, but we conducted two types of experiments to compare our results to exhaustive docking. In the first approach, we randomly sampled 1% of the compounds from all sub-libraries, giving a sample size that is 1% of the total enumerated set and evenly distributed across all reactions in the chemical space. The total number of docked molecules in Chemical Space Docking and the random sample of 1% of the entire space is roughly the same, about 8 million.
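The first experiment amounts to stratified random sampling: the same fraction is drawn from every sub-library so that the sample stays evenly distributed across reactions. A minimal sketch, assuming each sub-library is available as a list of compound identifiers; the function name, the fixed seed and the rounding rule are illustrative assumptions, not details from the source.

```python
import random

def sample_sublibraries(sublibraries, fraction=0.01, seed=0):
    """Draw the same fraction of compounds from every sub-library.

    sublibraries: dict mapping sub-library name -> list of compound IDs
    Returns a dict mapping sub-library name -> list of sampled IDs.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for name, compounds in sublibraries.items():
        # Assumption: the per-library sample size is rounded to an integer.
        n = round(len(compounds) * fraction)
        sample[name] = rng.sample(compounds, n)  # without replacement
    return sample

# Toy sub-libraries (hypothetical names and sizes).
toy_space = {"s1": list(range(1000)), "s2": list(range(300))}
toy_sample = sample_sublibraries(toy_space)
```

Sampling per sub-library, rather than from the pooled space, guarantees that small reactions are represented in proportion to their size.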
We compared the distribution of docking scores from the random sample to those obtained from Chemical Space Docking (as described in the manuscript). Supplementary Figure 5 shows the comparison of the distribution of docking scores of both samples. The vast majority of the compounds sampled from the full library are poor dockers, while Chemical Space Docking significantly enriches for good dockers. This demonstrates that Chemical Space Docking is much more efficient at finding compounds that have high docking scores.
In the second approach, we selected two chemical subspaces (reactions and associated building blocks) from which experimentally validated hits were obtained and whose size reasonably allowed fully enumerated docking: subspaces s38 (~3.8 million compounds) and s270302 (~838 thousand compounds). We enumerated and docked all compounds in these subspaces and compared the distribution of docking scores to those selected by Chemical Space Docking. Supplementary Figures 6 and 7 show the comparisons of the Chemical Space Docking results to the full enumeration of the compounds in subspaces s38 and s270302. For subspace s38 (Supplementary Figure 6) the chemical space docking approach does a very good job retrieving the best dockers from the full-enumeration docking, and its distribution of scores almost matches a very sparsely populated tail of the full distribution. A striking amount of unproductive effort is avoided by limiting the very poor dockers (score <1) to only a few hundred.
Chemical subspace s270302 (Supplementary Figure 7) looks a little different. Again, we see a significant waste of effort in the > 95% of the fully enumerated library that fails to achieve a reasonable docking score. A comparison of the distributions of the high-scoring compounds shows that, for this subspace, Chemical Space Docking does indeed miss some of the compounds that were identified by docking the fully enumerated library. This results not from a failure to identify an initial building block as a good docking component, but rather from the filtering with the six exclusion criteria detailed in the main manuscript. These criteria clearly excluded initial building blocks that would have led to complete products that dock well; those products were never instantiated because the building blocks had been filtered out beforehand.
Supplementary Figure 5: Comparison of the distribution of docking scores from chemical space docking with that obtained from a random sample of 1% of the full chemical space. Docking scores are binned in 1 unit increments of estimated pIC50 values and reported as a histogram. Bars are labeled by the percentage of the compounds in the sampled chemical space. Source data are provided as a Source Data file.
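The binning described in this caption can be reproduced with a short sketch. It assumes unit-wide bins of the form [k, k+1) and percentages taken relative to the size of each sample; the exact bin edges used for the figure are not stated, and the function name is hypothetical.

```python
import math
from collections import Counter

def score_histogram(scores):
    """Bin scores into unit-wide bins [k, k+1) and return {k: percentage}."""
    counts = Counter(math.floor(s) for s in scores)
    total = len(scores)
    return {k: 100.0 * counts[k] / total for k in sorted(counts)}

# Toy estimated pIC50 values (hypothetical).
toy_scores = [0.2, 0.9, 1.5, 5.1]
toy_hist = score_histogram(toy_scores)
```

Expressing each bar as a percentage of its own sample makes the two distributions comparable even when the samples differ in size.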